Practical Bayesian Modeling with Stan and brms

Author

Masatoshi Katabuchi

Published

October 31, 2025

1 Docker (optional)

Run this on your terminal to start an RStudio Server with all the necessary packages installed.

docker run -d -p 8787:8787 \
  -e PASSWORD=123456  \
  mattocci/afec-bayes:prod

Access 127.0.0.1:8787 and log in with the username rstudio and the password 123456 (or your own password).

Note: Because you’re running this container on your own machine and mapping only to 127.0.0.1:8787, the RStudio Server isn’t accessible from outside your computer. So, using a simple password like 123456 is fine and safe for local development.

2 Setup

library(brms)
library(cmdstanr)
library(tidyverse)
library(bayesplot)
library(patchwork)
# cmdstanr::set_cmdstan_path("/opt/cmdstan/cmdstan-2.37.0")

my_theme <- theme_bw() + theme(
    axis.text = element_text(size = 14),
    strip.text = element_text(size = 14)
  )

theme_set(my_theme)
library(tictoc)

3 Normal model

Leaf mass per area (LMA) is an important functional trait that reflects plant strategies. Across species, LMA typically follows a log-normal distribution.

\[ \text{log}(y_i) \sim N(\mu, \sigma) \]

Q: What are the mean (\(\mu\)) and standard deviation (\(\sigma\)) of log LMA?

3.1 Dummy data

set.seed(123)
n <- 100
mu <- 4.6
sigma <- 0.7
y <- rnorm(n, mean = mu, sd = sigma)

hist(y)
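These log-scale parameters map back to the natural LMA scale through standard log-normal identities. A quick check using the simulated values:

```r
# If log(y) ~ Normal(mu, sigma), the log-normal identities give the
# natural-scale median and mean of y (here, LMA):
mu <- 4.6
sigma <- 0.7
median_lma <- exp(mu)               # median of y
mean_lma   <- exp(mu + sigma^2 / 2) # mean of y (> median: right skew)
c(median = median_lma, mean = mean_lma)
```

This is also why back-transforming a log-scale estimate with `exp(mu)` gives the median of the original variable, not its mean.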

cmdstanr takes data as a named list whose element names match the variables declared in the Stan program’s data block.

normal_list <- list(
  y = y,
  N = length(y)
)

brms takes data as a data frame (or tibble), just like lme4::lmer and stats::glm.

normal_df <- tibble(
  y = y,
  N = length(y)
)

3.2 CmdStan

We need to write and save Stan programs (.stan files).

stan/normal.stan

 // observed data
data {
  int<lower=0> N;
  vector[N] y;
}

// parameters to be estimated
parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  // likelihood
  y ~ normal(mu, sigma);
  // priors
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 2.5);
} 

stan/normal_vague.stan

 data {
  int<lower=0> N;
  vector[N] y;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  y ~ normal(mu, sigma);
  mu ~ normal(0, 1e+4); // Vague prior for mu
  sigma ~ normal(0, 1e+4); // Vague prior for sigma
} 

These commands compile the models, translating the Stan code to C++ and then to a machine-executable binary. Compilation takes a while (about 20 seconds to a minute), but only the first time; the compiled executable is cached and reused on subsequent runs.

normal_mod <- cmdstan_model("stan/normal.stan")
normal_vague_mod <- cmdstan_model("stan/normal_vague.stan")
# Persist CSV outputs to avoid /tmp cleanup during render
normal_out_dir <- file.path("assets", "cmdstan", "normal")
if (!dir.exists(normal_out_dir)) dir.create(normal_out_dir, recursive = TRUE)

normal_fit <- normal_mod$sample(
  data = normal_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,  # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 500,        # print update every 500 iters
  output_dir = normal_out_dir
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.2 seconds.

We don’t need to recompile the model to change the number of iterations, the number of chains, or the data.

normal_vague_fit <- normal_vague_mod$sample(
  data = normal_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 500 # print update every 500 iters
)

3.2.1 Summary

We use this output when writing manuscripts and for model diagnostics.

normal_fit$summary()
#> # A tibble: 3 × 10
#>   variable   mean median     sd    mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 lp__     -6.34  -6.03  1.00   0.721  -8.30  -5.39  1.00     1927.    2334.
#> 2 mu        4.66   4.66  0.0659 0.0651  4.55   4.77  1.000    3370.    2293.
#> 3 sigma     0.646  0.643 0.0456 0.0455  0.575  0.725 1.00     3819.    2828.
  • lp__: log posterior.
    • \(\text{log}\; p(\theta \mid \text{data}) \propto \text{log}\; p(\text{data} \mid \theta) + \text{log}\; p(\theta)\)
  • rhat (Gelman-Rubin statistic): should be close to 1.
    • Measures the convergence of the chains.
    • rhat > 1.1: Definitely bad.
    • 1.01 < rhat < 1.1: Suspicious.
    • rhat <= 1.01: Good.
  • ess_bulk and ess_tail: Effective sample size for bulk and tail of the posterior distribution.
    • ess_bulk: sampling efficiency for the bulk posteriors (e.g., mean, SD).
    • ess_tail: sampling efficiency for the tails (e.g., quantiles like 2.5%, 97.5%).
    • ess_bulk, ess_tail > 400: Good.
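These thresholds can be checked programmatically. A small helper (hypothetical, not part of cmdstanr) that flags failing rows in a summary table with `rhat`, `ess_bulk`, and `ess_tail` columns:

```r
# Flag rows of a summary table that fail common diagnostic thresholds.
flag_diagnostics <- function(s, rhat_max = 1.01, ess_min = 400) {
  bad <- s$rhat > rhat_max | s$ess_bulk < ess_min | s$ess_tail < ess_min
  s[bad, , drop = FALSE]
}

# Toy table standing in for normal_fit$summary():
s <- data.frame(
  variable = c("mu", "sigma"),
  rhat     = c(1.000, 1.020),
  ess_bulk = c(3370, 350),
  ess_tail = c(2293, 2828)
)
flag_diagnostics(s)  # returns the sigma row: rhat and ess_bulk both fail
```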

3.2.2 Draws (posterior samples)

For each parameter, we have 1000 sampling iterations \(\times\) 4 chains = 4000 posterior draws. We use these for visualization and diagnostics.

normal_fit$draws()
#> # A draws_array: 1000 iterations, 4 chains, and 3 variables
#> , , variable = lp__
#> 
#>          chain
#> iteration    1    2     3    4
#>         1 -5.4 -5.5 -10.3 -5.5
#>         2 -5.5 -6.8  -5.4 -5.7
#>         3 -5.5 -6.7  -6.7 -6.1
#>         4 -6.3 -6.1  -5.8 -6.8
#>         5 -6.1 -5.5  -5.7 -6.0
#> 
#> , , variable = mu
#> 
#>          chain
#> iteration   1   2   3   4
#>         1 4.6 4.6 4.5 4.7
#>         2 4.6 4.8 4.7 4.7
#>         3 4.6 4.8 4.6 4.6
#>         4 4.7 4.7 4.7 4.7
#>         5 4.7 4.6 4.7 4.6
#> 
#> , , variable = sigma
#> 
#>          chain
#> iteration    1    2    3    4
#>         1 0.65 0.66 0.63 0.63
#>         2 0.64 0.67 0.62 0.66
#>         3 0.63 0.64 0.70 0.60
#>         4 0.68 0.67 0.68 0.57
#>         5 0.67 0.65 0.60 0.59
#> 
#> # ... with 995 more iterations

Trace plots help diagnose convergence and mixing of the chains.

normal_draws <- as_draws_df(normal_fit$draws())

color_scheme_set("viridis")
mcmc_trace(normal_draws, pars = c("mu", "sigma"))

Histograms of the posterior distributions of the parameters.

mcmc_hist(normal_draws, pars = c("mu", "sigma"))

3.2.3 Diagnostic summary

normal_fit$diagnostic_summary()
#> $num_divergent
#> [1] 0 0 0 0
#> 
#> $num_max_treedepth
#> [1] 0 0 0 0
#> 
#> $ebfmi
#> [1] 1.005079 1.074466 1.096794 1.190007
  • num_divergent: the number of sampling iterations in which the Hamiltonian trajectory diverged (a divergent transition).
    • Even 1-2 divergent transitions suggest the posterior may not be explored reliably, especially in hierarchical models.
    • Fixes include reparameterizing the model or increasing adapt_delta so the sampler takes smaller steps.
  • num_max_treedepth: the number of iterations that hit the maximum tree depth (the sampler ran out of leapfrog steps before the trajectory turned back).
    • This can indicate a complex posterior geometry or strongly correlated parameters.
    • We can increase max_treedepth to allow more steps, but this increases computation time.
  • ebfmi: Energy-Bayesian Fraction of Missing Information
    • Measures whether the resampled momentum generates enough variation in energy to explore the posterior efficiently.
    • Low ebfmi means the sampler may be stuck in tight regions and exploring poorly, even if Rhat and ESS look fine.
    • Guidelines:
      • ebfmi < 0.3: Bad
      • 0.3 < ebfmi <= 1: Acceptable
      • ebfmi >= 1: Good

3.2.4 Methods text example

“We estimated posterior distributions of all parameters using the Hamiltonian Monte Carlo (HMC) algorithm implemented in Stan (Carpenter et al., 2017) with weakly informative priors (Gelman et al., 2008). The HMC algorithm uses gradient information to propose new states in the Markov chain, leading to a more efficient exploration of the target distribution than traditional Markov chain Monte Carlo (MCMC) methods that rely on random proposals (Carpenter et al., 2017). This efficiency allows us to achieve convergence with fewer iterations than traditional MCMC methods. Four independent chains were run for 2000 iterations for each model with a warm-up of 1000 iterations. Convergence of the posterior distribution was assessed using the Gelman–Rubin statistic with a convergence threshold of 1.1 (Gelman et al., 2013), ensuring effective sample sizes greater than 400 (Vehtari et al., 2021), and by monitoring divergent transitions (Betancourt, 2016) for all parameters.”

3.3 brms

brms uses an lme4-like syntax.

normal_fit_brm0 <- brm(y ~ 1, family = gaussian(), data = normal_df)

We can also specify priors for the parameters. The prior argument can be a bit tricky, since it is not always obvious which parameter names correspond to which parts of the model, especially when the model becomes more complex.
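To see which parameter names and classes a given formula produces, brms can list the default priors for a model before fitting it:

```r
# List the parameters (and their default priors) for this model;
# the class column gives the name to use in prior().
get_prior(y ~ 1, family = gaussian(), data = normal_df)
```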

We can also control the number of warmup iterations, total iterations, and other sampling settings.

The cmdstanr backend is generally faster than the default rstan backend.

normal_fit_brm <- brm(
  y ~ 1,
  family = gaussian(),
  data = normal_df,
  prior = c(
    prior(normal(0, 5), class = "Intercept"),
    prior(cauchy(0, 2.5), class = "sigma")
  ),
  seed = 123,
  iter = 2000, # total iterations (warmup + post-warmup)
  warmup = 1000,
  chains = 4, cores = 4,
  backend = "cmdstanr"
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.2 seconds.

3.3.1 Summary

summary(normal_fit_brm)
#>  Family: gaussian 
#>   Links: mu = identity 
#> Formula: y ~ 1 
#>    Data: normal_df (Number of observations: 100) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Regression Coefficients:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     4.66      0.06     4.53     4.79 1.00     3017     2207
#> 
#> Further Distributional Parameters:
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.65      0.05     0.56     0.75 1.00     3322     2774
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

3.3.2 Draws (posterior samples)

normal_draws_brm <- posterior::as_draws_df(normal_fit_brm)
normal_draws_brm
#> # A draws_df: 1000 iterations, 4 chains, and 5 variables
#>    b_Intercept sigma Intercept lprior lp__
#> 1          4.8  0.63       4.8   -4.4 -102
#> 2          4.5  0.73       4.5   -4.4 -104
#> 3          4.5  0.76       4.5   -4.4 -105
#> 4          4.8  0.70       4.8   -4.4 -103
#> 5          4.8  0.61       4.8   -4.4 -104
#> 6          4.7  0.66       4.7   -4.4 -102
#> 7          4.6  0.63       4.6   -4.4 -102
#> 8          4.6  0.63       4.6   -4.4 -102
#> 9          4.7  0.69       4.7   -4.4 -103
#> 10         4.6  0.60       4.6   -4.4 -102
#> # ... with 3990 more draws
#> # ... hidden reserved variables {'.chain', '.iteration', '.draw'}
mcmc_trace(normal_draws_brm, pars = c("b_Intercept", "sigma"))
mcmc_hist(normal_draws_brm, pars = c("b_Intercept", "sigma"))

3.3.3 Diagnostic summary

rstan::check_hmc_diagnostics(normal_fit_brm$fit)
#> 
#> Divergences:
#> 
#> Tree depth:
#> 
#> Energy:

4 Poisson model

Count data (e.g., species richness, seed counts, pollinator visits…) often follow a Poisson distribution.

\[ y_i \sim \text{Poisson}(\lambda) \]

\[ y_i \sim \operatorname{Poisson\_log}(\log \lambda) \]

Q: What are the mean and variance (\(\lambda\)) of species richness per plot?
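A quick simulation illustrates the defining property behind this question: for a Poisson distribution, the mean and the variance are both \(\lambda\).

```r
# Mean and variance of a large Poisson sample are both close to lambda.
set.seed(123)
y_sim <- rpois(1e5, lambda = 12.3)
mean(y_sim)  # ~ 12.3
var(y_sim)   # ~ 12.3
```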

4.1 Dummy data

set.seed(123)
y <- rpois(n, lambda = 12.3)

pois_list <- list(
  y = y,
  N = n
)

pois_df <- tibble(
  y = y,
  N = n
)

4.2 CmdStan

stan/pois.stan

 data {
  int<lower=0> N;
  array[N] int<lower=0> y;
}

parameters {
  real<lower=0> lambda;
}

model {
  y ~ poisson(lambda);
  lambda ~ normal(0, 10);
} 

stan/pois_re.stan

 data {
  int<lower=0> N;
  array[N] int<lower=0> y;
}

parameters {
  real log_lambda;
}

model {
  y ~ poisson_log(log_lambda);
  log_lambda ~ normal(0, 2.5);
} 
pois_mod <- cmdstan_model("stan/pois.stan")
pois_re_mod <- cmdstan_model("stan/pois_re.stan")
pois_fit <- pois_mod$sample(
  data = pois_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 0 # print update every 500 iters
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.2 seconds.
pois_re_fit <- pois_re_mod$sample(
  data = pois_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 0 # print update every 500 iters
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.2 seconds.
pois_fit$summary()
#> # A tibble: 2 × 10
#>   variable   mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>     <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 lp__     1836.  1836.  0.710 0.305 1834.  1836.   1.00    1898.    2556.
#> 2 lambda     12.2   12.2 0.349 0.344   11.6   12.8  1.00    1370.    1871.
pois_re_fit$summary()
#> # A tibble: 2 × 10
#>   variable      mean  median     sd    mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>        <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 lp__       1833.   1834.   0.714  0.324  1.83e3 1.83e3  1.00    2170.    2574.
#> 2 log_lambda    2.50    2.50 0.0287 0.0295 2.46e0 2.55e0  1.00    1367.    2197.
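The two parameterizations agree: back-transforming the posterior mean of log_lambda recovers the lambda estimate from the direct parameterization.

```r
# exp() maps the log-scale estimate back to the rate scale.
exp(2.50)
#> [1] 12.18249
```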

4.3 brms

pois_fit_brm <- brm(y ~ 1,
  family = poisson(),
  data = pois_df,
  refresh = 0,
  backend = "cmdstanr")
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.5 seconds.
summary(pois_fit_brm)
#>  Family: poisson 
#>   Links: mu = log 
#> Formula: y ~ 1 
#>    Data: pois_df (Number of observations: 100) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Regression Coefficients:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     2.50      0.03     2.44     2.56 1.00     1650     1850
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

5 Linear model

The simplest multiple linear regression model has two predictors (\(x_1\) and \(x_2\)), two slopes (\(\beta_1\) and \(\beta_2\)), an intercept (\(\beta_0\)), and normally distributed residuals with standard deviation \(\sigma\). In standard regression notation, the model is

\[ y_i \sim N(\mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}, \sigma) \]

or

\[ y_i \sim N(\mu_i = \boldsymbol{\beta \cdot x_i}, \sigma) \]

where \(\boldsymbol{\beta}\) is a 1 \(\times\) 3 row vector of coefficients and \(\boldsymbol{x_i}\) is a 3 \(\times\) 1 column vector of predictors (with a leading 1 for the intercept). Stacking all predictors into the 3 \(\times\) \(N\) matrix \(\boldsymbol{x}\) lets us compute the whole length-\(N\) vector \(\boldsymbol{\mu}\) in a single multiplication.
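In R, the same matrix computation looks like this (a toy example with \(N = 2\) observations; the first row of \(\boldsymbol{x}\) is all 1s for the intercept):

```r
beta <- matrix(c(2, 1.2, -0.8), nrow = 1)  # 1 x 3 row vector
x <- rbind(1,                              # intercept row
           x1 = c(0.5, -1.0),
           x2 = c(0.2, 0.3))               # 3 x N design matrix
mu <- as.vector(beta %*% x)                # length-N vector of means
mu
#> [1] 2.44 0.56
```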

Q: What are the posterior means and 95% credible intervals of the coefficients (\(\beta_0\), \(\beta_1\), \(\beta_2\))?

set.seed(123)
n <- 100
beta <- c(2, 1.2, -0.8)
x1 <- rnorm(n, mean = 0, sd = 1)
x2 <- rnorm(n, mean = 0, sd = 1)
sigma <- 0.4
y <- rnorm(n, mean = beta[1] + beta[2] * x1 + beta[3] * x2, sd = sigma)
lm_list <- list(
  y = y,
  N = length(y),
  x = rbind(1, x1, x2) # 3 x N matrix
)

lm_df <- tibble(
  y = y,
  N = length(y),
  x1 = x1,
  x2 = x2
)

5.1 CmdStan

stan/lm.stan

 data {
  int<lower=0> N;
  vector[N] y;
  matrix[3, N] x;
}

parameters {
  row_vector[3] beta;
  real<lower=0> sigma;
}

model {
  y ~ normal(beta * x, sigma);
  beta ~ normal(0, 2.5);
  sigma ~ cauchy(0, 2.5);
} 
lm_mod <- cmdstan_model("stan/lm.stan")

lm_fit <- lm_mod$sample(
  data = lm_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 0 # don't print update
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.2 seconds.

Use posterior::summarise_draws for more detailed summary statistics.

lm_summary <- posterior::summarise_draws(
  lm_fit$draws(),
  mean = ~mean(.x),
  sd = ~sd(.x),
  ~posterior::quantile2(.x, probs = c(0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975))
)

lm_summary
#> # A tibble: 5 × 10
#>   variable   mean     sd   q2.5     q5    q25    q50    q75    q95  q97.5
#>   <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 lp__     44.6   1.48   40.8   41.6   43.8   44.9   45.7   46.3   46.4  
#> 2 beta[1]   2.05  0.0390  1.98   1.99   2.03   2.05   2.08   2.12   2.13 
#> 3 beta[2]   1.15  0.0430  1.06   1.08   1.12   1.15   1.17   1.22   1.23 
#> 4 beta[3]  -0.790 0.0400 -0.870 -0.856 -0.816 -0.789 -0.763 -0.726 -0.715
#> 5 sigma     0.386 0.0292  0.334  0.342  0.365  0.384  0.405  0.437  0.448
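As a sanity check (a common habit, not part of the Stan workflow), with these weak priors the posterior means should sit close to the ordinary least-squares estimates. Regenerating the same data and fitting lm() makes that easy to verify:

```r
# Reproduce the simulated data and compare against OLS.
set.seed(123)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rnorm(n, mean = 2 + 1.2 * x1 - 0.8 * x2, sd = 0.4)
coef(lm(y ~ x1 + x2))  # close to the posterior means of beta above
```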

5.2 brms

lm_fit_brm <- brm(
  y ~ x1 + x2,
  family = gaussian(),
  data = lm_df,
  prior = c(
    prior(normal(0, 2.5), class = "Intercept"),
    prior(normal(0, 2.5), class = "b"),
    prior(cauchy(0, 2.5), class = "sigma")
  ),
  refresh = 0, # don't print update
  backend = "cmdstanr"
)
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.5 seconds.

summary(lm_fit_brm)
#>  Family: gaussian 
#>   Links: mu = identity 
#> Formula: y ~ x1 + x2 
#>    Data: lm_df (Number of observations: 100) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Regression Coefficients:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     2.05      0.04     1.97     2.13 1.00     4347     3260
#> x1            1.15      0.04     1.06     1.23 1.00     3995     2774
#> x2           -0.79      0.04    -0.87    -0.71 1.00     4302     2884
#> 
#> Further Distributional Parameters:
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.39      0.03     0.34     0.45 1.00     4241     2851
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
lm_summary_brm <- posterior::summarise_draws(
  posterior::as_draws_df(lm_fit_brm),
  mean = ~mean(.x),
  sd = ~sd(.x),
  ~posterior::quantile2(.x, probs = c(0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975))
)

lm_summary_brm
#> # A tibble: 7 × 10
#>   variable       mean     sd    q2.5      q5     q25     q50     q75     q95
#>   <chr>         <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 b_Intercept   2.05  0.0394   1.97    1.99    2.03    2.05    2.08    2.12 
#> 2 b_x1          1.15  0.0427   1.06    1.08    1.12    1.15    1.18    1.22 
#> 3 b_x2         -0.790 0.0404  -0.868  -0.855  -0.816  -0.790  -0.762  -0.723
#> 4 sigma         0.386 0.0286   0.336   0.342   0.366   0.385   0.403   0.435
#> 5 Intercept     2.24  0.0391   2.17    2.18    2.22    2.24    2.27    2.31 
#> 6 lprior       -7.46  0.0172  -7.49   -7.48   -7.47   -7.45   -7.44   -7.43 
#> 7 lp__        -54.3   1.51   -58.1   -57.2   -55.0   -53.9   -53.1   -52.5  
#> # ℹ 1 more variable: q97.5 <dbl>

6 Varying intercepts

H: There is a power-law relationship (\(y =\beta_0x^{\beta_1}\)) between tree diameter (DBH) and tree maximum height, and the scaling factor \(\beta_0\) varies among species.

\[ \text{log}\; y_{ij} \sim N(\text{log}\; \beta_{0j} + \beta_1 \, \text{log}\; x_{i}, \sigma) \]

\[ \text{log}\; \beta_{0j} \sim N(\mu_0, \tau) \]

\[ \mu_0 \sim N(0, 2.5) \]

\[ \tau \sim \text{Half-Cauchy}(0, 1) \]

Q: What are the species-level scaling factors (\(\beta_{0j}\)) and the global scaling exponent (\(\beta_1\))?

6.1 Dummy data

  • We simulate a dataset with 10 species and 20 trees per species.
  • Each species has wood density (wd) values.
set.seed(12345)

n_sp <- 10
n_rep <- 20
wd <- rnorm(n_sp, 0, 1)
gamma0 <- 0.6
gamma1 <- 0.1
sigma_y <- 0.1

b1_hat <- gamma1 * wd + gamma0
b1 <- rnorm(n_sp, b1_hat, 0.01)
log_b0 <- rnorm(n_sp, 0.55, 0.05)

# ---- simulate ----
allo_df <- tibble(
  sp     = factor(rep(paste0("sp", 1:n_sp), each = n_rep)),
  wd  = rep(wd, each = n_rep),
  # log_xx ~ Normal(mean = log(40), sd = 0.5)
  log_xx = rnorm(n_sp * n_rep, mean = log(40), sd = 0.5)) |>
  mutate(
    # add observation-level noise on log-height
    log_y = rnorm(
      n(),
      log_b0[as.integer(sp)] + b1[as.integer(sp)] * log_xx,
      sigma_y),
    dbh = exp(log_xx),
    h = exp(log_y)) |>
  select(sp, wd, dbh, h)

dbh_hist <- allo_df |>
  ggplot(aes(dbh)) +
  geom_histogram() +
  xlab("DBH (cm)") +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 24)
    )

h_hist <- allo_df |>
  ggplot(aes(h)) +
  geom_histogram() +
  xlab("Height (m)") +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 24)
    )

p1 <- allo_df |>
  ggplot(aes(x = dbh, y = h, col = sp)) +
  geom_point() +
  xlab("DBH (cm)") +
  ylab(expression("Height (m)")) +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 24)
    )

p2 <- p1 +
  scale_x_log10() +
  scale_y_log10()


dbh_hist + h_hist + p1 + p2

allo_list <- list(
  log_h = log(allo_df$h),
  log_dbh = log(allo_df$dbh),
  sp = as.integer(allo_df$sp),
  N = nrow(allo_df),
  J = allo_df$sp |> unique() |> length()
)

6.2 CmdStan

stan/vint.stan

 data {
  int<lower=1> N;               // total number of observations
  int<lower=1> J;               // number of species
  array[N] int<lower=1, upper=J> sp; // sp index
  vector[N] log_dbh;            // predictor: log(DBH)
  vector[N] log_h;              // response: log(height)
}

parameters {
  real mu0;                  // grand mean of the species intercepts
  real<lower=0> tau;         // SD of species intercepts
  vector[J] log_beta0;       // species-level intercepts, log(beta0[j])
  real beta1;                // common slope on log(DBH)
  real<lower=0> sigma;       // residual SD
}

model {
  // Likelihood
  log_h ~ normal(log_beta0[sp] + beta1 * log_dbh, sigma);

  // Priors
  mu0 ~ normal(0, 2.5);
  tau      ~ cauchy(0, 1);
  log_beta0  ~ normal(mu0, tau);
  beta1    ~ normal(0, 2.5);
  sigma    ~ cauchy(0, 1);
}

// we need this for model selection
generated quantities {
  vector[N] log_lik;
  for (i in 1:N)
    log_lik[i] = normal_lpdf(log_h[i]
                    | log_beta0[sp[i]] + beta1 * log_dbh[i],
                      sigma);
} 
vint_mod <- cmdstan_model("stan/vint.stan")

vint_fit <- vint_mod$sample(
  data = allo_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  refresh = 0 # don't print update
)
#> Running MCMC with 4 parallel chains...
#> Chain 1 finished in 0.2 seconds.
#> Chain 2 finished in 0.2 seconds.
#> Chain 3 finished in 0.2 seconds.
#> Chain 4 finished in 0.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.2 seconds.
#> Total execution time: 0.3 seconds.

vint_fit$summary()
#> # A tibble: 215 × 10
#>    variable         mean   median     sd    mad      q5      q95  rhat ess_bulk
#>    <chr>           <dbl>    <dbl>  <dbl>  <dbl>   <dbl>    <dbl> <dbl>    <dbl>
#>  1 lp__         365.     366.     2.69   2.59   360.    369.      1.00    1185.
#>  2 mu0            0.483    0.484  0.119  0.114    0.285   0.679   1.00    1081.
#>  3 tau            0.332    0.314  0.0912 0.0781   0.221   0.503   1.00    1508.
#>  4 log_beta0[1]   0.782    0.782  0.0565 0.0569   0.689   0.875   1.01     416.
#>  5 log_beta0[2]   0.921    0.923  0.0544 0.0542   0.832   1.01    1.01     419.
#>  6 log_beta0[3]   0.440    0.442  0.0555 0.0564   0.349   0.531   1.01     413.
#>  7 log_beta0[4]   0.265    0.266  0.0580 0.0597   0.171   0.359   1.01     393.
#>  8 log_beta0[5]   0.620    0.621  0.0538 0.0535   0.531   0.708   1.01     420.
#>  9 log_beta0[6]  -0.0154  -0.0146 0.0545 0.0550  -0.105   0.0733  1.01     432.
#> 10 log_beta0[7]   0.710    0.712  0.0545 0.0546   0.621   0.798   1.01     420.
#> # ℹ 205 more rows
#> # ℹ 1 more variable: ess_tail <dbl>
vint_fit$diagnostic_summary()
#> $num_divergent
#> [1] 0 0 0 0
#> 
#> $num_max_treedepth
#> [1] 0 0 0 0
#> 
#> $ebfmi
#> [1] 1.0896007 0.9152977 1.0290832 1.0009125

6.3 brms

priors <- c(
  prior(normal(0, 2.5), class = Intercept),
  prior(normal(0, 2.5),   class = b),
  prior(cauchy(0, 1),   class = sd),
  prior(cauchy(0, 1),   class = sigma)
)

vint_fit_brm <- brm(
  log(h) ~ log(dbh) + (1 | sp),
  family = gaussian(),
  data = allo_df,
  prior = priors,
  seed = 123,
  refresh = 0, # don't print update
  backend = "cmdstanr"
)
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 finished in 0.5 seconds.
#> Chain 2 finished in 0.5 seconds.
#> Chain 3 finished in 0.5 seconds.
#> Chain 4 finished in 0.4 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.5 seconds.
#> Total execution time: 2.3 seconds.

summary(vint_fit_brm)
#>  Family: gaussian 
#>   Links: mu = identity 
#> Formula: log(h) ~ log(dbh) + (1 | sp) 
#>    Data: allo_df (Number of observations: 200) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Multilevel Hyperparameters:
#> ~sp (Number of levels: 10) 
#>               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sd(Intercept)     0.33      0.09     0.21     0.55 1.01      714     1210
#> 
#> Regression Coefficients:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     0.47      0.12     0.23     0.70 1.00      813     1101
#> logdbh        0.61      0.01     0.59     0.64 1.00     2035     1824
#> 
#> Further Distributional Parameters:
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.10      0.01     0.09     0.11 1.00     2004     2100
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
plot(vint_fit_brm)

6.4 lme4

vint_fit_lme <- lme4::lmer(
  log(h) ~ log(dbh) + (1 | sp),
  data = allo_df)

summary(vint_fit_lme)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: log(h) ~ log(dbh) + (1 | sp)
#>    Data: allo_df
#> 
#> REML criterion at convergence: -298.3
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.66966 -0.62316  0.04254  0.60122  2.28315 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  sp       (Intercept) 0.084434 0.29057 
#>  Residual             0.009794 0.09897 
#> Number of obs: 200, groups:  sp, 10
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  0.47916    0.10454   4.584
#> log(dbh)     0.61154    0.01314  46.525
#> 
#> Correlation of Fixed Effects:
#>          (Intr)
#> log(dbh) -0.472

6.5 Visualization and predictions

6.5.1 CmdStan

# Predicted DBH-height curves per species with 95% intervals
pred_grid <- allo_df |>
  dplyr::group_by(sp) |>
  dplyr::summarise(
    dbh = list(seq(min(dbh), max(dbh), length.out = 60)),
    .groups = "drop"
  ) |>
  tidyr::unnest(dbh)

cmd_draws <- posterior::as_draws_matrix(vint_fit$draws(c("beta1", "log_beta0")))
beta1_draws <- cmd_draws[, "beta1"]
log_beta0_cols <- grep("^log_beta0\\[", colnames(cmd_draws), value = TRUE)
species_levels <- levels(allo_df$sp)

pred_grid contains the DBH grid at which we draw the predicted line for each species.

pred_grid
#> # A tibble: 600 × 2
#>    sp      dbh
#>    <fct> <dbl>
#>  1 sp1    12.2
#>  2 sp1    14.0
#>  3 sp1    15.8
#>  4 sp1    17.6
#>  5 sp1    19.5
#>  6 sp1    21.3
#>  7 sp1    23.1
#>  8 sp1    25.0
#>  9 sp1    26.8
#> 10 sp1    28.6
#> # ℹ 590 more rows

Posterior draws of the slope parameter (\(\beta_1\)).

str(beta1_draws)
#>  'draws_matrix' num [1:4000, 1] 0.638 0.635 0.62 0.627 0.629 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ draw    : chr [1:4000] "1" "2" "3" "4" ...
#>   ..$ variable: chr "beta1"
#>  - attr(*, "nchains")= int 4

Column names for log-transformed species-level intercepts (\(\log \beta_{0j}\)). We use this to extract posterior draws for each species.

str(log_beta0_cols)
#>  chr [1:10] "log_beta0[1]" "log_beta0[2]" "log_beta0[3]" "log_beta0[4]" ...
pred_cmd_summary <- dplyr::bind_cols(
  pred_grid,
  purrr::map2_dfr(
    match(pred_grid$sp, species_levels),
    log(pred_grid$dbh),
    ~{
      log_mu <- cmd_draws[, log_beta0_cols[.x]] + beta1_draws * .y
      height_draws <- exp(log_mu)
      tibble::tibble(
        estimate = stats::median(height_draws),
        lower = stats::quantile(height_draws, 0.025),
        upper = stats::quantile(height_draws, 0.975)
      )
    }
  )
)

This computes the posterior median and 95% interval of predicted height for each species and DBH value.

pred_cmd_summary
#> # A tibble: 600 × 5
#>    sp      dbh estimate lower upper
#>    <fct> <dbl>    <dbl> <dbl> <dbl>
#>  1 sp1    12.2     10.1  9.53  10.6
#>  2 sp1    14.0     11.0 10.4   11.6
#>  3 sp1    15.8     11.8 11.2   12.4
#>  4 sp1    17.6     12.6 12.0   13.3
#>  5 sp1    19.5     13.4 12.8   14.1
#>  6 sp1    21.3     14.2 13.5   14.9
#>  7 sp1    23.1     14.9 14.2   15.6
#>  8 sp1    25.0     15.6 14.9   16.3
#>  9 sp1    26.8     16.3 15.6   17.1
#> 10 sp1    28.6     17.0 16.2   17.8
#> # ℹ 590 more rows
ggplot(pred_cmd_summary, aes(dbh, estimate, colour = sp)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = sp), alpha = 0.15, colour = NA) +
  geom_line() +
  geom_point(
    data = allo_df,
    aes(dbh, h, colour = sp),
    size = 1,
    alpha = 0.6,
    inherit.aes = FALSE
  ) +
  scale_x_log10() +
  scale_y_log10() +
  guides(fill = "none") +
  labs(
    x = "DBH (cm)",
    y = "Height (m)",
    colour = "Species"
  )

6.5.2 brms

With fitted() available, preparing pred_summary is simpler than in the CmdStan workflow.

pred_summary <- fitted(vint_fit_brm, newdata = pred_grid, re_formula = NULL) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(pred_grid) |>
  dplyr::mutate(
    estimate = exp(Estimate),
    lower = exp(Q2.5),
    upper = exp(Q97.5)
  )

ggplot(pred_summary, aes(dbh, estimate, color = sp)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = sp), alpha = 0.15, colour = NA) +
  geom_line() +
  geom_point(
    data = allo_df,
    aes(dbh, h, colour = sp),
    size = 1,
    alpha = 0.6,
    inherit.aes = FALSE
  ) +
  scale_x_log10() +
  scale_y_log10() +
  guides(fill = "none") +
  labs(
    x = "DBH (cm)",
    y = "Height (m)",
    colour = "Species"
  )

7 Reparameterization for multilevel models

Diagnosing Biased Inference with Divergences by Michael Betancourt

7.1 Eight schools example

What is the effect of the treatment (a new education method) on students’ scores for each school and across schools?

school      y  sigma
School_1   28     15
School_2    8     10
School_3   -3     16
School_4    7     11
School_5   -1      9
School_6    1     11
School_7   18     10
School_8   12     18
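For the CmdStan workflow, this table can be assembled into a data list in the same format used earlier (the list name is illustrative):

```r
# Eight schools data as a cmdstanr-style list
schools_list <- list(
  J = 8,
  y = c(28, 8, -3, 7, -1, 1, 18, 12),
  sigma = c(15, 10, 16, 11, 9, 11, 10, 18)
)
```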

7.2 Centered parameterization

\[ y_j \sim \mathcal{N}\bigl(\theta_j,\;\sigma_j\bigr) \]

\[ \theta_j \sim \mathcal{N}\bigl(\mu,\;\tau \bigr) \]

\[ \mu \sim \mathcal{N}\bigl(0,\;5 \bigr) \]

\[ \tau \sim \text{Half-Cauchy}(0, 2.5) \]

Note: \(\sigma_j\) is a known constant (i.e., data), so we don't have to estimate it.

This parameterization is intuitive, but it often leads to convergence issues in hierarchical models.
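A minimal Stan sketch of this centered model, following the priors above (the file would be something like stan/schools_centered.stan):

```stan
data {
  int<lower=0> J;           // number of schools
  vector[J] y;              // observed treatment effects
  vector<lower=0>[J] sigma; // known standard errors (data)
}

parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta;          // school-level effects (centered)
}

model {
  y ~ normal(theta, sigma); // likelihood
  theta ~ normal(mu, tau);  // hierarchical prior
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 2.5);
}
```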

7.3 Non-centered parameterization

\[ \tilde{\theta_j} \sim \mathcal{N}\bigl(0,\;1 \bigr) \]

\[ \theta_j = \mu + \tau \cdot \tilde{\theta_j} \]

In this parameterization, we introduce a new latent variable \(\tilde{\theta_j}\) whose prior is independent of \(\mu\) and \(\tau\). This removes the funnel-shaped dependence between \(\theta_j\) and \(\tau\), letting the sampler explore the posterior more efficiently and avoiding convergence problems. We often use this parameterization for hierarchical models with varying intercepts or slopes.
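The corresponding non-centered Stan sketch changes only how \(\theta_j\) is constructed; the likelihood and the priors on \(\mu\) and \(\tau\) stay the same:

```stan
data {
  int<lower=0> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}

parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta_tilde;      // standardized latent effects
}

transformed parameters {
  // theta_j = mu + tau * theta_tilde_j
  vector[J] theta = mu + tau * theta_tilde;
}

model {
  y ~ normal(theta, sigma);
  theta_tilde ~ std_normal(); // N(0, 1)
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 2.5);
}
```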

8 Varying slopes and intercepts

H: There is a power-law relationship (\(y =\beta_0 x^{\beta_1}\)) between tree diameter (DBH) and tree maximum height, and both the scaling factor \(\beta_0\) and the exponent \(\beta_1\) vary among species.

\[ \text{log}\; y_{ij} \sim N(\text{log}\; \beta_{0j} + \beta_{1j}\, \text{log}\; x_{ij}, \sigma) \]

\[ \begin{bmatrix} \text{log}\; \beta_{0j} \\ \beta_{1j} \end{bmatrix} \sim \mathrm{MVN} \Biggl( \boldsymbol{\mu} = \begin{bmatrix} \mu_{0} \\ \mu_{1} \end{bmatrix}, \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{0}^2 & \rho \sigma_{0} \sigma_{1} \\ \rho \sigma_{0} \sigma_{1} & \sigma_{1}^2 \end{bmatrix} \Biggr) \]

For reference, the multivariate normal (MVN) distribution looks like this:

Q: What are the species-level scaling factors (\(\beta_{0j}\)) and scaling exponents (\(\beta_{1j}\))? What are the global scaling factor (\(\mu_0\)) and scaling exponent (\(\mu_1\))?

8.1 Centered parameterization

\[ \boldsymbol{\mu} \sim N(0, 2.5) \]

\[ \boldsymbol{\Sigma} \sim \mathcal{W}^{-1}_p(\nu,\,\Psi) \]

The inverse-Wishart distribution \(\mathcal{W}^{-1}\) is the conjugate prior for the covariance matrix \(\boldsymbol{\Sigma}\) in the MVN. However, despite conjugacy, inverse-Wishart priors are often computationally difficult to sample from.

Recommendation: use a Cholesky factorization with independent priors on scales and correlation (below).

8.2 Non-Centered parameterization

Optimization through Cholesky factorization

\[ \boldsymbol{z}_j \sim \mathcal{N}(\mathbf0,\,I_2) \] \[ \begin{pmatrix}\log\beta_{0j}\\[3pt]\beta_{1j}\end{pmatrix} = \boldsymbol{\mu} + L\, \boldsymbol{z}_j \]

Some linear algebra (Cholesky decomposition):

\[ \Sigma \;=\;L\,L^\top \;=\;\bigl[\mathrm{diag}(\tau)\,L_\Omega\bigr]\, \bigl[L_\Omega^\top\,\mathrm{diag}(\tau)\bigr] \;=\;\mathrm{diag}(\tau)\,\Omega\,\mathrm{diag}(\tau). \]
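This identity is easy to verify numerically in R (the values of tau and Omega below are made up for illustration):

```r
tau <- c(0.3, 0.1)                       # scales
Omega <- matrix(c(1, 0.5, 0.5, 1), 2, 2) # correlation matrix
L_Omega <- t(chol(Omega))                # lower-triangular Cholesky factor
Sigma_direct <- diag(tau) %*% Omega %*% diag(tau)
Sigma_chol <- diag(tau) %*% L_Omega %*% t(L_Omega) %*% diag(tau)
all.equal(Sigma_direct, Sigma_chol)
#> [1] TRUE
```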

\[ L_\Omega \sim \mathrm{LKJ\_Corr\_Cholesky}(2) \quad \text{or} \quad \Omega \sim \mathrm{LKJ\_Corr}(2) \] \[ \tau_i \sim \mathrm{Half\text{-}Cauchy}(0,1) \]

Rather than sampling the covariance matrix \(\Sigma\) or correlation matrix \(\Omega\) directly, sampling the Cholesky factor \(L_\Omega\) is computationally more efficient and numerically stable.

In the LKJ() prior:

  • \(\eta = 1\) indicates a flat prior (uniform distribution) over correlation matrices.

  • \(\eta > 1\) places more mass on weaker correlations (matrices closer to the identity).

We usually use \(\eta = 2\) because it gently favors weak correlations among the parameters while still allowing moderate to strong ones.
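Concretely, the LKJ density over correlation matrices is proportional to a power of the determinant:

\[ p(\Omega \mid \eta) \;\propto\; \det(\Omega)^{\eta - 1}, \]

so \(\eta = 1\) gives a constant density, while larger \(\eta\) shrinks the density for strongly correlated matrices.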

8.3 CmdStan

stan/vslope.stan

 data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num ind predictors
  int<lower=1> J;              // num groups
  array[N] int<lower=1, upper=J> jj;  // group for individual
  int<lower=1> L;              // number of group-level predictors
  matrix[N, K] x;              // individual predictors
  matrix[L, J] u;              // group predictors
  vector[N] y;                 // outcomes
}

parameters {
  matrix[K, L] gamma;          // coefficients across groups
  real<lower=0> sigma;         // prediction error scale
  vector<lower=0>[K] tau;      // prior scale
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
}

transformed parameters {
  matrix[K, J] beta;
  // mu = gamma * u
  beta = gamma * u + diag_pre_multiply(tau, L_Omega) * z;
}

model {
  to_vector(z) ~ std_normal();
  // tau ~ cauchy(0, 2.5);
  tau ~ normal(0, 2.5);
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 2.5);
  for (n in 1:N) {
    y[n] ~ normal(x[n, ] * beta[, jj[n]], sigma);
  }
} 
allo_vslope_list <- list(
  y = log(allo_df$h),
  x = cbind(1, log(allo_df$dbh)),
  jj = as.integer(allo_df$sp),
  N = nrow(allo_df),
  K = 2, # number of predictors (intercept + slope)
  J = allo_df$sp |> unique() |> length(),
  L = 1 # number of group-level predictors (intercept)
)
allo_vslope_list$u <- matrix(1, nrow = allo_vslope_list$L, ncol = allo_vslope_list$J)
vslope_mod <- cmdstan_model("stan/vslope.stan")

vslope_fit <- vslope_mod$sample(
  data = allo_vslope_list,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  adapt_delta = 0.95, # increase adapt_delta to avoid divergent transitions
  max_treedepth = 15, # increase max_treedepth to avoid max treedepth errors
  refresh = 0, # don't print update
  show_messages = FALSE   # suppress Stan’s “Informational Message” output
)

# vslope_fit$summary()
vslope_summary <- posterior::summarise_draws(
  vslope_fit$draws(),
  mean = ~mean(.x),
  sd = ~sd(.x),
  ~posterior::quantile2(.x, probs = c(0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975))
)

vslope_summary
#> # A tibble: 50 × 10
#>    variable      mean      sd     q2.5       q5      q25     q50     q75     q95
#>    <chr>        <dbl>   <dbl>    <dbl>    <dbl>    <dbl>   <dbl>   <dbl>   <dbl>
#>  1 lp__       3.60e+2 4.50     3.50e+2  3.52e+2 357.     3.60e+2 3.63e+2 366.   
#>  2 gamma[1,1] 5.02e-1 0.0547   3.97e-1  4.13e-1   0.466  5.04e-1 5.37e-1   0.590
#>  3 gamma[2,1] 6.06e-1 0.0310   5.44e-1  5.56e-1   0.587  6.06e-1 6.25e-1   0.658
#>  4 sigma      9.24e-2 0.00469  8.37e-2  8.49e-2   0.0892 9.23e-2 9.55e-2   0.100
#>  5 tau[1]     6.96e-2 0.0553   2.93e-3  5.53e-3   0.0276 5.86e-2 9.79e-2   0.171
#>  6 tau[2]     8.84e-2 0.0280   4.88e-2  5.27e-2   0.0694 8.41e-2 1.02e-1   0.140
#>  7 z[1,1]     5.80e-1 0.900   -1.37e+0 -1.01e+0   0.0366 6.36e-1 1.19e+0   1.98 
#>  8 z[2,1]     6.58e-1 0.628   -7.80e-1 -4.61e-1   0.305  7.11e-1 1.06e+0   1.59 
#>  9 z[1,2]     1.61e-1 0.994   -1.86e+0 -1.52e+0  -0.515  1.87e-1 8.53e-1   1.72 
#> 10 z[2,2]     1.29e+0 0.667   -1.37e-1  1.73e-1   0.890  1.31e+0 1.72e+0   2.35 
#> # ℹ 40 more rows
#> # ℹ 1 more variable: q97.5 <dbl>
vslope_fit$diagnostic_summary()
#> $num_divergent
#> [1] 0 0 0 0
#> 
#> $num_max_treedepth
#> [1] 0 0 0 0
#> 
#> $ebfmi
#> [1] 0.8183985 0.7952493 0.8137258 0.8179199

8.4 brms

priors <- c(
  prior(normal(0, 2.5), class = Intercept),
  prior(normal(0, 2.5),   class = b),
  prior(lkj(2), class = cor),                # Ω ~ LKJ(2)
  prior(cauchy(0, 1),   class = sigma)
)

slope_fit_brm <- brm(
  log(h) ~ log(dbh) + (1 + log(dbh) | sp),
  family = gaussian(),
  data = allo_df,
  prior = priors,
  refresh = 0, # don't print update
  control = list(
    adapt_delta  = 0.95,  # default is 0.8
    max_treedepth = 15    # default is 10
  ),
  backend = "cmdstanr"
)
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 finished in 3.2 seconds.
#> Chain 2 finished in 3.8 seconds.
#> Chain 3 finished in 3.7 seconds.
#> Chain 4 finished in 3.6 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 3.6 seconds.
#> Total execution time: 14.7 seconds.

Divergent transitions! Even with adapt_delta = 0.95 and max_treedepth = 15, some divergent transitions remain (see the warning below). We may need to increase adapt_delta further or change the priors. Reparameterization is less straightforward in brms than when writing your own Stan code.

summary(slope_fit_brm)
#> Warning: There were 23 divergent transitions after warmup. Increasing
#> adapt_delta above 0.95 may help. See
#> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#>  Family: gaussian 
#>   Links: mu = identity 
#> Formula: log(h) ~ log(dbh) + (1 + log(dbh) | sp) 
#>    Data: allo_df (Number of observations: 200) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Multilevel Hyperparameters:
#> ~sp (Number of levels: 10) 
#>                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> sd(Intercept)             0.07      0.06     0.00     0.21 1.00     2079
#> sd(logdbh)                0.09      0.03     0.05     0.16 1.00     1023
#> cor(Intercept,logdbh)     0.09      0.42    -0.72     0.83 1.00      806
#>                       Tail_ESS
#> sd(Intercept)             1896
#> sd(logdbh)                 532
#> cor(Intercept,logdbh)     1347
#> 
#> Regression Coefficients:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     0.50      0.05     0.39     0.61 1.00     3102      773
#> logdbh        0.61      0.03     0.55     0.67 1.01     1079      389
#> 
#> Further Distributional Parameters:
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.09      0.00     0.08     0.10 1.00     4735     2977
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

8.5 lme4

slope_fit_lme <- lme4::lmer(
  log(h) ~ log(dbh) + (1 + log(dbh) | sp),
  data = allo_df)

Singular fit! lme4 estimates the random-effect correlation at the boundary (Corr = 1.00) and reports a boundary (singular) fit below.

summary(slope_fit_lme)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: log(h) ~ log(dbh) + (1 + log(dbh) | sp)
#>    Data: allo_df
#> 
#> REML criterion at convergence: -327.2
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -3.14307 -0.63987  0.08714  0.61696  3.03735 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  sp       (Intercept) 0.001326 0.03641      
#>           log(dbh)    0.004607 0.06787  1.00
#>  Residual             0.008397 0.09164      
#> Number of obs: 200, groups:  sp, 10
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  0.49980    0.04759   10.50
#> log(dbh)     0.60637    0.02468   24.57
#> 
#> Correlation of Fixed Effects:
#>          (Intr)
#> log(dbh) -0.264
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')

9 Varying slopes and intercepts, and group-level predictors

H: There is a power-law relationship (\(y =\beta_0 x^{\beta_1}\)) between tree diameter (DBH) and tree maximum height, and both the scaling factor \(\beta_0\) and the exponent \(\beta_1\) vary among species. Those species-level variations can be explained by wood density (group-level predictor: \(u\)) of each species.

\[ \text{log}\; y_{ij} \sim N(\text{log}\; \beta_{0j} + \beta_{1j}\, \text{log}\; x_{ij}, \sigma) \]

\[ \begin{bmatrix} \text{log}\; \beta_{0j} \\ \beta_{1j} \end{bmatrix} \sim \mathrm{MVN} \Biggl( \underbrace{ \begin{bmatrix} \gamma_{0}^{(\beta_0)} + \gamma_{1}^{(\beta_0)}u_{j} \\ \gamma_{0}^{(\beta_1)} + \gamma_{1}^{(\beta_1)}u_{j} \end{bmatrix}}_{\text{mean depends on }u_j}, \begin{bmatrix} \sigma_{0}^2 & \rho \sigma_{0} \sigma_{1} \\ \rho \sigma_{0} \sigma_{1} & \sigma_{1}^2 \end{bmatrix} \Biggr) \]

Q: What are the coefficients for the species-level regressions (\(\gamma_{0}^{(\beta_0)}\), \(\gamma_{1}^{(\beta_0)}\), \(\gamma_{0}^{(\beta_1)}\), and \(\gamma_{1}^{(\beta_1)}\))?

Now we have two group-level predictors (an intercept and wood density), so \(u\) contains a row of ones and a row of species wood densities. The rest of this section follows the setup from the varying slopes and intercepts section.
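For reference, this mean structure corresponds to the line beta = gamma * u + diag_pre_multiply(tau, L_Omega) * z in stan/vslope.stan:

\[ \beta \;=\; \gamma\,u \;+\; \mathrm{diag}(\tau)\,L_\Omega\,z, \]

where each column of \(\beta\) holds \((\log \beta_{0j},\; \beta_{1j})^\top\) for species \(j\).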

allo_vslope_list$L <- 2

allo_vslope_list$u <- rbind(
  rep(1, allo_vslope_list$J),
  trait
)
str(allo_vslope_list)
#> List of 8
#>  $ y : num [1:200] 3.29 3.66 3.7 3.58 3.19 ...
#>  $ x : num [1:200, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ jj: int [1:200] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ N : int 200
#>  $ K : num 2
#>  $ J : int 10
#>  $ L : num 2
#>  $ u : num [1:2, 1:10] 1 0.586 1 0.709 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "" "trait"
#>   .. ..$ : NULL
vgrp_fit <- vslope_mod$sample(
  data = allo_vslope_list,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000, # number of warmup iterations
  iter_sampling = 1000, # number of sampling iterations
  adapt_delta = 0.95, # increase adapt_delta to avoid divergent transitions
  max_treedepth = 15, # increase max_treedepth to avoid max treedepth errors
  refresh = 0 # don't print update
)
#> Running MCMC with 4 parallel chains...
#> Chain 4 finished in 4.0 seconds.
#> Chain 2 finished in 4.6 seconds.
#> Chain 1 finished in 4.7 seconds.
#> Chain 3 finished in 5.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 4.6 seconds.
#> Total execution time: 5.2 seconds.

vgrp_fit$summary()
#> # A tibble: 52 × 10
#>    variable       mean   median      sd     mad        q5     q95  rhat ess_bulk
#>    <chr>         <dbl>    <dbl>   <dbl>   <dbl>     <dbl>   <dbl> <dbl>    <dbl>
#>  1 lp__       358.     358.     4.51    4.36    350.      3.65e+2 1.00     1004.
#>  2 gamma[1,1]   0.503    0.504  0.0516  0.0498    0.420   5.87e-1 1.00     3069.
#>  3 gamma[2,1]   0.618    0.618  0.0148  0.0148    0.593   6.41e-1 1.00     2703.
#>  4 gamma[1,2]   0.0414   0.0410 0.0627  0.0605   -0.0630  1.46e-1 1.00     3220.
#>  5 gamma[2,2]   0.0805   0.0802 0.0183  0.0177    0.0512  1.10e-1 1.000    2617.
#>  6 sigma        0.0926   0.0924 0.00494 0.00483   0.0847  1.01e-1 1.00     5732.
#>  7 tau[1]       0.0641   0.0565 0.0459  0.0459    0.00552 1.47e-1 1.00     1564.
#>  8 tau[2]       0.0246   0.0235 0.0131  0.0117    0.00500 4.76e-2 1.00     1210.
#>  9 z[1,1]       0.409    0.430  0.814   0.773    -0.897   1.71e+0 1.00     3208.
#> 10 z[2,1]       0.171    0.221  0.760   0.715    -1.16    1.35e+0 1.000    2873.
#> # ℹ 42 more rows
#> # ℹ 1 more variable: ess_tail <dbl>
vgrp_fit$diagnostic_summary()
#> $num_divergent
#> [1] 1 0 0 0
#> 
#> $num_max_treedepth
#> [1] 0 0 0 0
#> 
#> $ebfmi
#> [1] 0.8973376 0.7535103 0.9068761 0.7740817

9.1 brms

priors <- c(
  prior(normal(0, 2.5), class = Intercept),
  prior(normal(0, 2.5),   class = b),
  prior(lkj(2), class = cor),                # Ω ~ LKJ(2)
  prior(cauchy(0, 1),   class = sigma)
)
vgrp_fit_brm <- brm(
  log(h) ~ log(dbh) * wd + (1 + log(dbh) | sp),
  family = gaussian(),
  data = allo_df,
  prior = priors,
  refresh = 0, # don't print update
  control = list(
    adapt_delta  = 0.95,  # default is 0.8
    max_treedepth = 15    # default is 10
  ),
  backend = "cmdstanr"
)
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 finished in 4.2 seconds.
#> Chain 2 finished in 4.7 seconds.
#> Chain 3 finished in 4.5 seconds.
#> Chain 4 finished in 4.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 4.6 seconds.
#> Total execution time: 18.6 seconds.
summary(vgrp_fit_brm)
#> Warning: There were 5 divergent transitions after warmup. Increasing
#> adapt_delta above 0.95 may help. See
#> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#>  Family: gaussian 
#>   Links: mu = identity 
#> Formula: log(h) ~ log(dbh) * wd + (1 + log(dbh) | sp) 
#>    Data: allo_df (Number of observations: 200) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Multilevel Hyperparameters:
#> ~sp (Number of levels: 10) 
#>                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> sd(Intercept)             0.09      0.07     0.00     0.25 1.00     1698
#> sd(logdbh)                0.08      0.03     0.04     0.15 1.00     1335
#> cor(Intercept,logdbh)     0.13      0.42    -0.69     0.85 1.01      865
#>                       Tail_ESS
#> sd(Intercept)             1867
#> sd(logdbh)                1909
#> cor(Intercept,logdbh)     1516
#> 
#> Regression Coefficients:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     0.50      0.06     0.38     0.62 1.00     3835     2984
#> logdbh        0.60      0.03     0.54     0.66 1.00     1949     1867
#> wd            0.03      0.07    -0.11     0.18 1.00     3489     2388
#> logdbh:wd    -0.05      0.04    -0.12     0.03 1.00     2148     2121
#> 
#> Further Distributional Parameters:
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.09      0.00     0.08     0.10 1.00     4956     2552
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

9.2 lme4

grp_fit_lme <- lme4::lmer(
  log(h) ~ log(dbh) * wd + (1 + log(dbh) | sp),
  data = allo_df)

10 References

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.1 (2025-06-13)
#>  os       Ubuntu 24.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2025-10-31
#>  pandoc   3.8.1 @ /usr/bin/ (via rmarkdown)
#>  quarto   1.7.32 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package        * version  date (UTC) lib source
#>  P abind            1.4-8    2024-09-12 [?] RSPM (R 4.5.0)
#>  P backports        1.5.0    2024-05-23 [?] RSPM (R 4.5.0)
#>  P bayesplot      * 1.14.0   2025-08-31 [?] RSPM (R 4.5.0)
#>  P boot             1.3-32   2025-08-29 [?] RSPM (R 4.5.0)
#>  P bridgesampling   1.1-2    2021-04-16 [?] RSPM (R 4.5.0)
#>  P brms           * 2.23.0   2025-09-09 [?] RSPM (R 4.5.0)
#>  P Brobdingnag      1.2-9    2022-10-19 [?] RSPM (R 4.5.0)
#>  P cachem           1.1.0    2024-05-16 [?] RSPM (R 4.5.0)
#>  P checkmate        2.3.3    2025-08-18 [?] RSPM (R 4.5.0)
#>  P cli              3.6.5    2025-04-23 [?] RSPM (R 4.5.0)
#>  P cmdstanr       * 0.9.0    2025-03-30 [?] repository (https://github.com/stan-dev/cmdstanr@da99e2b)
#>  P coda             0.19-4.1 2024-01-31 [?] RSPM (R 4.5.0)
#>    codetools        0.2-20   2024-03-31 [3] CRAN (R 4.5.1)
#>  P data.table       1.17.8   2025-07-10 [?] RSPM (R 4.5.0)
#>    devtools         2.4.6    2025-10-03 [2] RSPM (R 4.5.0)
#>  P digest           0.6.37   2024-08-19 [?] RSPM (R 4.5.0)
#>  P distributional   0.5.0    2024-09-17 [?] RSPM (R 4.5.0)
#>  P dplyr          * 1.1.4    2023-11-17 [?] RSPM (R 4.5.0)
#>    ellipsis         0.3.2    2021-04-29 [2] RSPM (R 4.5.0)
#>  P evaluate         1.0.5    2025-08-27 [?] RSPM (R 4.5.0)
#>  P farver           2.1.2    2024-05-13 [?] RSPM (R 4.5.0)
#>  P fastmap          1.2.0    2024-05-15 [?] RSPM (R 4.5.0)
#>  P forcats        * 1.0.1    2025-09-25 [?] RSPM (R 4.5.0)
#>  P fs               1.6.6    2025-04-12 [?] RSPM (R 4.5.0)
#>  P generics         0.1.4    2025-05-09 [?] RSPM (R 4.5.0)
#>  P ggplot2        * 4.0.0    2025-09-11 [?] RSPM (R 4.5.0)
#>  P glue             1.8.0    2024-09-30 [?] RSPM (R 4.5.0)
#>  P gridExtra        2.3      2017-09-09 [?] RSPM (R 4.5.0)
#>  P gtable           0.3.6    2024-10-25 [?] RSPM (R 4.5.0)
#>  P hms              1.1.4    2025-10-17 [?] RSPM (R 4.5.0)
#>  P htmltools        0.5.8.1  2024-04-04 [?] RSPM (R 4.5.0)
#>    htmlwidgets      1.6.4    2023-12-06 [2] RSPM (R 4.5.0)
#>  P inline           0.3.21   2025-01-09 [?] RSPM (R 4.5.0)
#>  P isoband          0.2.7    2022-12-20 [?] RSPM (R 4.5.0)
#>  P jsonlite         2.0.0    2025-03-27 [?] RSPM (R 4.5.0)
#>  P kableExtra       1.4.0    2024-01-24 [?] RSPM (R 4.5.0)
#>  P knitr          * 1.50     2025-03-16 [?] RSPM (R 4.5.0)
#>  P labeling         0.4.3    2023-08-29 [?] RSPM (R 4.5.0)
#>    lattice          0.22-7   2025-04-02 [3] CRAN (R 4.5.1)
#>  P lifecycle        1.0.4    2023-11-07 [?] RSPM (R 4.5.0)
#>  P lme4             1.1-37   2025-03-26 [?] RSPM (R 4.5.0)
#>  P loo              2.8.0    2024-07-03 [?] RSPM (R 4.5.0)
#>  P lubridate      * 1.9.4    2024-12-08 [?] RSPM (R 4.5.0)
#>  P magrittr         2.0.4    2025-09-12 [?] RSPM (R 4.5.0)
#>    MASS             7.3-65   2025-02-28 [3] CRAN (R 4.5.1)
#>  P Matrix           1.7-4    2025-08-28 [?] RSPM (R 4.5.0)
#>  P matrixStats      1.5.0    2025-01-07 [?] RSPM (R 4.5.0)
#>  P memoise          2.0.1    2021-11-26 [?] RSPM (R 4.5.0)
#>    mgcv             1.9-3    2025-04-04 [3] CRAN (R 4.5.1)
#>  P minqa            1.2.8    2024-08-17 [?] RSPM (R 4.5.0)
#>  P mvtnorm          1.3-3    2025-01-10 [?] RSPM (R 4.5.0)
#>    nlme             3.1-168  2025-03-31 [3] CRAN (R 4.5.1)
#>  P nloptr           2.2.1    2025-03-17 [?] RSPM (R 4.5.0)
#>  P patchwork      * 1.3.2    2025-08-25 [?] RSPM (R 4.5.0)
#>  P pillar           1.11.1   2025-09-17 [?] RSPM (R 4.5.0)
#>  P pkgbuild         1.4.8    2025-05-26 [?] RSPM (R 4.5.0)
#>  P pkgconfig        2.0.3    2019-09-22 [?] RSPM (R 4.5.0)
#>    pkgload          1.4.1    2025-09-23 [2] RSPM (R 4.5.0)
#>  P plyr             1.8.9    2023-10-02 [?] RSPM (R 4.5.0)
#>  P posterior        1.6.1    2025-02-27 [?] RSPM (R 4.5.0)
#>  P processx         3.8.6    2025-02-21 [?] RSPM (R 4.5.0)
#>  P ps               1.9.1    2025-04-12 [?] RSPM (R 4.5.0)
#>  P purrr          * 1.1.0    2025-07-10 [?] RSPM (R 4.5.0)
#>  P QuickJSR         1.8.1    2025-09-20 [?] RSPM (R 4.5.0)
#>  P R6               2.6.1    2025-02-15 [?] RSPM (R 4.5.0)
#>  P rbibutils        2.3      2024-10-04 [?] RSPM (R 4.5.0)
#>  P RColorBrewer     1.1-3    2022-04-03 [?] RSPM (R 4.5.0)
#>  P Rcpp           * 1.1.0    2025-07-02 [?] RSPM (R 4.5.0)
#>  P RcppParallel     5.1.11-1 2025-08-27 [?] RSPM (R 4.5.0)
#>  P Rdpack           2.6.4    2025-04-09 [?] RSPM (R 4.5.0)
#>  P readr          * 2.1.5    2024-01-10 [?] RSPM (R 4.5.0)
#>  P reformulas       0.4.1    2025-04-30 [?] RSPM (R 4.5.0)
#>    remotes          2.5.0    2024-03-17 [2] RSPM (R 4.5.0)
#>    renv             1.1.5    2025-07-24 [1] RSPM (R 4.5.0)
#>  P reshape2         1.4.4    2020-04-09 [?] RSPM (R 4.5.0)
#>  P rlang            1.1.6    2025-04-11 [?] RSPM (R 4.5.0)
#>  P rmarkdown        2.30     2025-09-28 [?] RSPM (R 4.5.0)
#>  P rstan            2.32.7   2025-03-10 [?] RSPM (R 4.5.0)
#>  P rstantools       2.5.0    2025-09-01 [?] RSPM (R 4.5.0)
#>  P rstudioapi       0.17.1   2024-10-22 [?] RSPM (R 4.5.0)
#>  P S7               0.2.0    2024-11-07 [?] RSPM (R 4.5.0)
#>  P scales           1.4.0    2025-04-24 [?] RSPM (R 4.5.0)
#>    sessioninfo      1.2.3    2025-02-05 [2] RSPM (R 4.5.0)
#>  P StanHeaders      2.32.10  2024-07-15 [?] RSPM (R 4.5.0)
#>  P stringi          1.8.7    2025-03-27 [?] RSPM (R 4.5.0)
#>  P stringr        * 1.5.2    2025-09-08 [?] RSPM (R 4.5.0)
#>  P svglite          2.2.1    2025-05-12 [?] RSPM (R 4.5.0)
#>  P systemfonts      1.3.1    2025-10-01 [?] RSPM (R 4.5.0)
#>  P tensorA          0.36.2.1 2023-12-13 [?] RSPM (R 4.5.0)
#>  P textshaping      1.0.4    2025-10-10 [?] RSPM (R 4.5.0)
#>  P tibble         * 3.3.0    2025-06-08 [?] RSPM (R 4.5.0)
#>  P tictoc         * 1.2.1    2024-03-18 [?] RSPM (R 4.5.0)
#>  P tidyr          * 1.3.1    2024-01-24 [?] RSPM (R 4.5.0)
#>  P tidyselect       1.2.1    2024-03-11 [?] RSPM (R 4.5.0)
#>  P tidyverse      * 2.0.0    2023-02-22 [?] RSPM (R 4.5.0)
#>  P timechange       0.3.0    2024-01-18 [?] RSPM (R 4.5.0)
#>  P tzdb             0.5.0    2025-03-15 [?] RSPM (R 4.5.0)
#>    usethis          3.2.1    2025-09-06 [2] RSPM (R 4.5.0)
#>  P utf8             1.2.6    2025-06-08 [?] RSPM (R 4.5.0)
#>  P vctrs            0.6.5    2023-12-01 [?] RSPM (R 4.5.0)
#>  P viridisLite      0.4.2    2023-05-02 [?] RSPM (R 4.5.0)
#>  P withr            3.0.2    2024-10-28 [?] RSPM (R 4.5.0)
#>  P xfun             0.53     2025-08-19 [?] RSPM (R 4.5.0)
#>  P xml2             1.4.1    2025-10-27 [?] RSPM (R 4.5.0)
#>  P yaml             2.3.10   2024-07-26 [?] RSPM (R 4.5.0)
#> 
#>  [1] /workspaces/afec-bayes-2025/renv/library/linux-ubuntu-noble/R-4.5/x86_64-pc-linux-gnu
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/local/lib/R/library
#> 
#>  * ── Packages attached to the search path.
#>  P ── Loaded and on-disk path mismatch.
#> 
#> ──────────────────────────────────────────────────────────────────────────────